A Joint Modeling of Vision-Language-Action for Target-oriented Grasping in Clutter

Xu, Kechun, Zhao, Shuqi, Zhou, Zhongxiang, Li, Zizhang, Pi, Huaijin, Zhu, Yifeng, Wang, Yue, Xiong, Rong

arXiv.org Artificial Intelligence

We focus on the task of language-conditioned grasping in clutter, in which a robot must grasp a target object based on a language instruction. Previous works separately conduct visual grounding to localize the target object and then generate a grasp for that object. However, these works require object labels or visual attributes for grounding, which calls for handcrafted rules in the planner and restricts the range of admissible language instructions. In this paper, we propose to jointly model vision, language, and action with an object-centric representation. Our method is applicable under more flexible language instructions and is not limited by visual grounding error. Besides, by utilizing the powerful priors from the pre-trained multi-modal model and grasp model, sample efficiency is effectively improved and the sim2real problem is relieved without additional data for transfer. A series of experiments carried out in simulation and the real world indicates that our method can achieve a higher task success rate with fewer motions under more flexible language instructions. Moreover, our method is capable of generalizing better to scenarios with unseen objects and language instructions. Our code is available at https://github.com/xukechun/Vision-Language-Grasping


Towards Understanding Chain-of-Thought Prompting: An Empirical Study of What Matters

Wang, Boshi, Min, Sewon, Deng, Xiang, Shen, Jiaming, Wu, You, Zettlemoyer, Luke, Sun, Huan

arXiv.org Artificial Intelligence

Chain-of-Thought (CoT) prompting can dramatically improve the multi-step reasoning abilities of large language models (LLMs). CoT explicitly encourages the LLM to generate intermediate rationales for solving a problem, by providing a series of reasoning steps in the demonstrations. Despite its success, there is still little understanding of what makes CoT prompting effective and which aspects of the demonstrated reasoning steps contribute to its performance. In this paper, we show that CoT reasoning is possible even with invalid demonstrations - prompting with invalid reasoning steps can achieve over 80-90% of the performance obtained using CoT under various metrics, while still generating coherent lines of reasoning during inference. Further experiments show that other aspects of the rationales, such as being relevant to the query and correctly ordering the reasoning steps, are much more important for effective CoT reasoning. Overall, these findings both deepen our understanding of CoT prompting, and open up new questions regarding LLMs' capability to learn to reason in context.
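The contrast between valid and invalid demonstrations can be sketched concretely. The snippet below is an illustrative example, not the paper's actual prompts: both few-shot exemplars keep the surface form of step-by-step reasoning (relevant, well-ordered steps), but the second contains deliberately wrong arithmetic, which is the kind of "invalid reasoning" the paper finds still yields most of CoT's benefit.

```python
# A CoT demonstration with VALID intermediate reasoning.
valid_demo = (
    "Q: Roger has 5 balls. He buys 2 cans of 3 balls each. How many balls now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 balls is 6 balls. "
    "5 + 6 = 11. The answer is 11.\n"
)

# The same demonstration with INVALID reasoning steps: still relevant and
# well-ordered, but arithmetically wrong.
invalid_demo = (
    "Q: Roger has 5 balls. He buys 2 cans of 3 balls each. How many balls now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 balls is 5 balls. "  # wrong step
    "5 + 5 = 11. The answer is 11.\n"                                # wrong step
)

def build_prompt(demo: str, question: str) -> str:
    """Prepend a (valid or invalid) CoT demonstration to the test question."""
    return f"{demo}\nQ: {question}\nA:"

prompt = build_prompt(invalid_demo, "A farm has 3 pens of 4 pigs each. How many pigs?")
print(prompt)
```

Feeding either prompt to an LLM and comparing answer accuracy reproduces the basic shape of the paper's ablation.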


DPCL: a Language Template for Normative Specifications

Sileno, Giovanni, van Binsbergen, Thomas, Pascucci, Matteo, van Engers, Tom

arXiv.org Artificial Intelligence

Several solutions for specifying normative artefacts (norms, contracts, policies) in a computationally processable way have been presented in the literature. Legal core ontologies have been proposed to systematize concepts and relationships relevant to normative reasoning. However, none of these solutions has achieved general acceptance, and no common ground (representational, computational) has been identified that enables us to easily compare them. Yet, all these efforts share the same motivation of representing normative directives, so it is plausible that there is a representational model encompassing all of them. This presentation introduces DPCL, a domain-specific language (DSL) for specifying higher-level policies (including norms, contracts, etc.), centred on Hohfeld's framework of fundamental legal concepts. DPCL is to be seen primarily as a "template", i.e. as an informational model for architectural reference, rather than a fully-fledged formal language; it aims to make explicit the general requirements that should be expected of a language for norm specification. In this respect, it goes in the direction of legal core ontologies, but unlike those, our proposal aims to keep the character of a DSL, rather than a set of axioms in a logical framework: it is meant to be cross-compiled to underlying languages/tools adequate to the type of target application. We provide here an overview of some of the language features.


Towards Personalized Explanation of Robotic Planning via User Feedback

Boggess, Kayla, Chen, Shenghui, Feng, Lu

arXiv.org Artificial Intelligence

Prior studies have found that providing explanations about robots' decisions and actions helps to improve system transparency, increase human users' trust in robots, and enable effective human-robot collaboration. Different users have different preferences about what should be included in explanations; however, little research has been conducted on the generation of personalized explanations. In this paper, we present a system for generating personalized explanations of robotic planning via user feedback. We consider robotic planning using Markov decision processes (MDPs) and develop an algorithm to automatically generate a personalized explanation of an optimal robotic plan (i.e., an optimal MDP policy) based on the user's preference regarding four elements (i.e., objective, locality, specificity, and abstraction). In addition, we design the system to interact with users by answering their further questions about the generated explanations. Users have the option to update their preferences to view different explanations. The system is capable of detecting and resolving any preference conflict via user interaction. Our user study results show that the generated personalized explanations improve user satisfaction, and the majority of users liked the system's capabilities of question answering, and conflict detection and resolution.
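The object being explained is an optimal MDP policy. The sketch below is not the paper's system; it shows textbook value iteration on a hypothetical three-state MDP, producing the optimal policy that a personalized explanation would then describe at the user's chosen level of locality and abstraction.

```python
# Toy MDP: states 0..2, with state 2 an absorbing goal.
STATES = [0, 1, 2]
ACTIONS = ["left", "right"]
GAMMA = 0.9

def step(s, a):
    """Deterministic toy transition: 'right' moves toward the goal."""
    if s == 2:
        return s, 0.0                      # absorbing goal state
    s_next = min(s + 1, 2) if a == "right" else max(s - 1, 0)
    reward = 1.0 if s_next == 2 else 0.0   # reward only on reaching the goal
    return s_next, reward

def value_iteration(tol=1e-6):
    """Standard value iteration; returns state values and a greedy policy."""
    V = {s: 0.0 for s in STATES}
    while True:
        delta = 0.0
        for s in STATES:
            best = max(r + GAMMA * V[s2]
                       for s2, r in (step(s, a) for a in ACTIONS))
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    policy = {s: max(ACTIONS,
                     key=lambda a: step(s, a)[1] + GAMMA * V[step(s, a)[0]])
              for s in STATES}
    return V, policy

V, policy = value_iteration()
print(policy)  # optimal action per non-goal state is "right"
```

A personalized explanation generator would take such a policy as input and filter or aggregate its state-action pairs according to the user's preferred objective, locality, specificity, and abstraction.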